fix: expand contact point hostnames to all DNS IPs at connection time (DRIVER-201) - Part 1/2 #889
Draft
nikagra wants to merge 3 commits into
… (DRIVER-201)

Addresses the initial-contact-endpoints aspect of DRIVER-201.

Problem: with RESOLVE_CONTACT_POINTS=false (the default), a contact point hostname was stored as a single unresolved InetSocketAddress. At connection time the load-balancing query plan contained exactly one Node per hostname, so only the first IP returned by DNS was ever tried. If that IP was non-responsive the driver raised AllNodesFailedException with no fallback to other IPs the hostname might resolve to.

Solution (per @dkropachev's architectural direction):
- Deprecate DefaultDriverOption.RESOLVE_CONTACT_POINTS. Contact points are now always kept as unresolved hostnames (resolve=false is hardcoded in SessionBuilder), deferring DNS expansion to connection time.
- Add MetadataManager.getResolvedContactPoints(): for each contact point backed by an unresolved hostname it calls InetAddress.getAllByName() to expand the hostname to all known IPs, creating a synthetic DefaultNode for each IP. Already-resolved or non-InetSocketAddress endpoints pass through unchanged.
- LoadBalancingPolicyWrapper now calls getResolvedContactPoints() instead of getContactPoints() in newQueryPlan() (BEFORE/DURING_INIT states) and newControlReconnectionQueryPlan(), so the query plan contains one node per resolved IP and the driver naturally falls back to the next IP when one is unreachable.

Tests:
- 4 new MetadataManagerTest cases covering null state, already-resolved passthrough, single-hostname expansion, and multi-endpoint expansion.
- LoadBalancingPolicyWrapperTest updated to stub getResolvedContactPoints().
- New MockResolverIT.should_connect_when_first_dns_entry_is_non_responsive integration test: first DNS entry is a non-existent IP, session must open successfully against the remaining real IPs.
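The expansion step described for getResolvedContactPoints() can be sketched roughly as below. This is an illustration only, not the driver's code: it works on plain InetSocketAddress values rather than DefaultNode/EndPoint, and the names ContactPointExpander, Resolver and SYSTEM_DNS are hypothetical. Keeping a hostname unresolved when the lookup fails is also an assumption, not something the commit message states.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class ContactPointExpander {

  /** Hypothetical seam so DNS can be stubbed in tests. */
  public interface Resolver {
    InetAddress[] resolveAll(String host) throws UnknownHostException;
  }

  /** Default resolver: returns every address DNS knows for the host. */
  public static final Resolver SYSTEM_DNS = InetAddress::getAllByName;

  /**
   * Expands each unresolved hostname into one address per resolved IP.
   * Already-resolved addresses pass through unchanged.
   */
  public static List<InetSocketAddress> expand(
      List<InetSocketAddress> contactPoints, Resolver resolver) {
    List<InetSocketAddress> expanded = new ArrayList<>();
    for (InetSocketAddress contactPoint : contactPoints) {
      if (!contactPoint.isUnresolved()) {
        expanded.add(contactPoint);
        continue;
      }
      try {
        for (InetAddress ip : resolver.resolveAll(contactPoint.getHostString())) {
          expanded.add(new InetSocketAddress(ip, contactPoint.getPort()));
        }
      } catch (UnknownHostException e) {
        // Assumption: keep the hostname unresolved so a later attempt can retry DNS.
        expanded.add(contactPoint);
      }
    }
    return expanded;
  }
}
```

In production-like use the call would be `ContactPointExpander.expand(contactPoints, ContactPointExpander.SYSTEM_DNS)`; a test can pass a lambda that returns fixed addresses instead.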
This was referenced May 15, 2026
…VER-201)

newControlReconnectionQueryPlan() now creates copies of the original contact-point nodes (with their unresolved hostname endpoints) instead of synthetic nodes with resolved IPs. This ensures the control channel carries the hostname endpoint, which is preserved in metadata after topology refresh.

DNS expansion for connection fallback is handled by ChannelFactory (PR scylladb#890), so the control-reconnection path does not need to inject resolved-IP nodes into the query plan.

Also adds getContactPoints() stub back to LoadBalancingPolicyWrapperTest so tests that cover the control-reconnect path continue to pass.
Before-init query plan now uses getContactPoints() (original unresolved hostname nodes) instead of getResolvedContactPoints(). The DNS expansion to all IPs happens at the ChannelFactory level (PR scylladb#890), so expanding here was redundant and broke should_connect_with_mocked_hostname by replacing hostname endpoints with resolved-IP endpoints. Also remove the should_connect_when_first_dns_entry_is_non_responsive integration test from this PR; it belongs in PR scylladb#890 where ChannelFactory expansion actually enables it to pass.
Problem
When `RESOLVE_CONTACT_POINTS=false` (the default), a contact point hostname was stored as a single unresolved `InetSocketAddress`. At connection time the load-balancing query plan contained exactly one `Node` per hostname, so only the first IP returned by DNS was ever tried. If that IP was non-responsive the driver raised `AllNodesFailedException` with no fallback to other IPs the hostname might resolve to.
This is particularly impactful in dynamic DNS environments where a hostname maps to multiple nodes and the first one may be temporarily unavailable.
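The fallback this PR is after can be illustrated with a small sketch: given several candidate addresses for one hostname, try each in order and stop at the first that accepts a connection. This is not the driver's implementation; FallbackConnect and Connector are hypothetical names, and the real driver reports total failure with AllNodesFailedException rather than the IllegalStateException used here.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class FallbackConnect {

  /** Hypothetical connection attempt; throws IOException when the address is unreachable. */
  public interface Connector {
    void connect(InetSocketAddress address) throws IOException;
  }

  /**
   * Tries each candidate in order and returns the first one that connects.
   * If every candidate fails, throws with the individual failures attached as suppressed.
   */
  public static InetSocketAddress connectToFirstReachable(
      List<InetSocketAddress> candidates, Connector connector) {
    List<IOException> failures = new ArrayList<>();
    for (InetSocketAddress candidate : candidates) {
      try {
        connector.connect(candidate);
        return candidate;
      } catch (IOException e) {
        // Remember the failure and fall through to the next IP.
        failures.add(e);
      }
    }
    IllegalStateException allFailed =
        new IllegalStateException("All " + candidates.size() + " candidate addresses failed");
    failures.forEach(allFailed::addSuppressed);
    throw allFailed;
  }
}
```

With only one candidate per hostname, as in the behaviour described above, the loop has nothing to fall back to and a single unreachable IP fails the whole attempt.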
Fixes DRIVER-201.
Changes
`DefaultDriverOption` / `TypedDriverOption`
`SessionBuilder`
`MetadataManager`
`LoadBalancingPolicyWrapper`
Tests